Internet Archive Web Crawls

The Internet Archive discovers and captures web pages through many different web crawls. At any given time several distinct crawls are running: some run for months, and others recur daily or at longer intervals. View the web archive through the Wayback Machine.

972,373 results

Media Type
collections: 109
web: 971,060
data: 1,204

Topics & Subjects
crawldata: 843,790
no404: 2,263
wikipedia: 1,452
wordpress: 811
amazonbooks: 252
end of term: 6

Collection
Internet Archive Web Crawls: 972,373
Web Crawls: 957,922
Worldwide Web Crawls: 494,437
Youtube Videos: 293,743
YouTube 2007 Crawl: 73,846
Wide Crawl Number 14 started March 2016: 71,730

Creator
internet archive: 837,698
internetarchive: 73,846
alexa internet: 32,949
google, inc.: 31
lekash@archive.org: 3
ximm@archive.org: 3

Language
English: 3,914

Live Web Proxy Crawls
web, 2.6M views, 0 favorites, 0 comments
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Wed Mar 30 03:11:49 PDT 2011 to Wed Mar 30 10:24:04 PDT 2011.
Topic: crawldata
2004 Indian Ocean earthquake and tsunami
collection, 42 items, 2.6M views
Data related to the 2004 Indian Ocean earthquake and tsunami collected by Internet Archive. This data is currently not publicly accessible.
Live Web Proxy Crawls
web, 2.5M views, 0 favorites, 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2013-03-29T09:35:58 UTC to 2013-03-29T15:03:34 UTC.
Topic: crawldata
Top 150 Crawl
collection, 30 items, 2.4M views
Top 150 Alexa sites crawl performed by Internet Archive. This data is currently not publicly accessible.
Data crawled by Internet Archive on behalf of Internet Archive from Sun Aug 14 21:06:52 PDT 2005 to Sun Nov 20 23:12:32 PDT 2005
Topic: crawldata
Live Web Proxy Crawls
web, 2.4M views, 0 favorites, 0 comments
UNT Web
collection, 35 items, 2.2M views
This collection contains all collaborative crawl data contributed by University of North Texas (UNT).
Topics: UNT, web, texas, eot
Live Web Proxy Crawls
web, 2.2M views, 0 favorites, 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live3.us.archive.org from 2013-07-11T00:40:27 UTC to 2013-07-11T05:26:31 UTC.
Topic: crawldata
UK Government Site Crawl
collection, 105 items, 2.2M views
Collaborative closure crawl of British government sites performed by Internet Archive. This data is currently not publicly accessible. From Wikipedia: GOV.UK is a United Kingdom public sector information website, created by the Government Digital Service to provide a single point of access to HM Government services.
End Of Term 2016 Pre-Inauguration Crawls
collection, 4,693 items, 2.1M views
This collection contains web crawls performed as the pre-inauguration crawl for part of the End of Term Web Archive, a collaborative project that aims to preserve the U.S. federal government web presence at each change of administration. Content includes publicly accessible government websites hosted on .gov, .mil, and relevant non-.gov domains, as well as government social media materials. The web archiving was performed in the fall and winter of 2016 to capture websites prior to the January...
Topics: end of term, federal government, 2016, president, congress
Live Web Proxy Crawls
web, 2M views, 0 favorites, 0 comments
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Fri May 20 12:11:05 PDT 2011 to Fri May 20 16:12:31 PDT 2011.
Topic: crawldata
Live Web Proxy Crawls
web, 2M views, 0 favorites, 0 comments
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Tue Mar 29 14:27:39 PDT 2011 to Tue Mar 29 20:01:59 PDT 2011.
Topic: crawldata
Live Web Proxy Crawls
web, 2M views, 0 favorites, 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2013-01-09T09:11:28 UTC to 2013-01-09T20:54:17 UTC.
Topic: crawldata
Live Web Proxy Crawls
web, 2M views, 0 favorites, 0 comments
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Fri May 20 23:16:11 PDT 2011 to Sat May 21 00:20:39 PDT 2011.
Topic: crawldata
Wide Crawl started February 2014
web, 2M views, 0 favorites, 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl453.us.archive.org:wide from Wed Feb 19 01:09:37 PST 2014 to Tue Feb 18 21:33:27 PST 2014.
Topic: crawldata
Data crawled by Sloan Foundation on behalf of Internet Archive from Wed Aug 01 19:54:22 PDT 2007 to Wed Aug 01 22:21:06 PDT 2007
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Sun Nov 16 11:09:15 PDT 2003 to Tue Jun 17 19:02:43 PDT 2003
Topic: crawldata
Live Web Proxy Crawls
web, 2M views, 0 favorites, 0 comments
Wide Crawl started February 2014
web, 2M views, 0 favorites, 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl454.us.archive.org:wide from Wed Feb 19 05:20:19 PST 2014 to Wed Feb 19 01:54:33 PST 2014.
Topic: crawldata
Wide Crawl started February 2014
web, 2M views, 0 favorites, 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl420.us.archive.org:wide from Tue Feb 18 17:01:58 PST 2014 to Tue Feb 18 13:14:06 PST 2014.
Topic: crawldata
Wide Crawl started February 2014
web, 1.9M views, 0 favorites, 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl427.us.archive.org:wide from Wed Feb 19 09:49:01 PST 2014 to Wed Feb 19 06:07:15 PST 2014.
Topic: crawldata
Live Web Proxy Crawls
web, 1.9M views, 0 favorites, 0 comments
Live Web Proxy Crawls
web, 1.9M views, 0 favorites, 0 comments
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Thu Mar 31 04:44:35 PDT 2011 to Thu Mar 31 13:13:56 PDT 2011.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Sat Mar 05 13:09:57 PDT 2005 to Sat Oct 29 13:54:47 PDT 2005
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Tue Mar 08 20:18:06 PDT 2005 to Tue Sep 27 05:06:47 PDT 2005
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Fri Sep 26 12:13:03 PDT 2003 to Wed Dec 10 23:25:51 PDT 2003
Topic: crawldata
Wide Crawl started February 2014
web, 1.9M views, 0 favorites, 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Wed Feb 19 08:18:23 PST 2014 to Wed Feb 19 04:21:37 PST 2014.
Topic: crawldata
Live Web Proxy Crawls
web, 1.9M views, 0 favorites, 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2013-01-09T16:00:53 UTC to 2013-01-10T01:01:51 UTC.
Topic: crawldata
Wide Crawl started February 2014
web, 1.9M views, 0 favorites, 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Tue Feb 18 22:58:46 PST 2014 to Tue Feb 18 19:25:19 PST 2014.
Topic: crawldata
Wide Crawl started February 2014
web, 1.9M views, 0 favorites, 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Wed Feb 19 07:58:38 PST 2014 to Wed Feb 19 05:13:46 PST 2014.
Topic: crawldata
Wide Crawl started February 2014
web, 1.8M views, 1 favorite, 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl416.us.archive.org:wide from Sat Feb 8 03:46:42 PST 2014 to Fri Feb 7 23:17:16 PST 2014.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Sat Sep 18 12:46:38 PDT 2004 to Thu May 05 09:34:36 PDT 2005
Topic: crawldata
Wide Crawl started February 2014
web, 1.8M views, 0 favorites, 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl414.us.archive.org:wide from Sat Feb 8 04:46:28 PST 2014 to Sat Feb 8 00:01:23 PST 2014.
Topic: crawldata
Survey Crawl December 2014
web, 1.8M views, 0 favorites, 0 comments
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl428.us.archive.org:survey from Thu Jan 8 02:39:52 PST 2015 to Thu Jan 8 02:17:55 PST 2015.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Thu Nov 27 00:45:40 PDT 2003 to Sat Mar 29 12:46:47 PDT 2003
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Sat Mar 29 18:35:25 PDT 2003 to Sun Nov 16 08:00:57 PDT 2003
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Wed May 04 22:04:03 PDT 2005 to Tue Oct 25 16:31:50 PDT 2005
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Fri Aug 12 02:13:26 PDT 2005 to Thu Dec 22 02:45:53 PDT 2005
Topic: crawldata
Live Web Proxy Crawls
web, 1.7M views, 0 favorites, 0 comments
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Fri Apr 1 18:45:57 PDT 2011 to Sat Apr 2 02:17:10 PDT 2011.
Topic: crawldata
Survey Crawl
web, 1.7M views, 0 favorites, 0 comments
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl835.us.archive.org:survey from Sun Jan 10 18:55:42 PST 2016 to Sun Jan 10 11:20:26 PST 2016.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Mon Jun 16 20:25:23 PDT 2003 to Wed Nov 26 17:48:35 PDT 2003
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Wed Apr 21 08:36:57 PDT 2004 to Fri Dec 03 12:31:33 PDT 2004
Topic: crawldata
Wide Crawl started January 2012
web, 1.6M views, 0 favorites, 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Sat Jan 21 04:01:50 PST 2012 to Fri Jan 20 21:01:34 PST 2012.
Topic: crawldata
Wayback Robots Crawl
collection, 129 items, 1.6M views
Wayback robots.txt crawl performed by Internet Archive. This data is currently not publicly accessible.
Data crawled by Internet Archive on behalf of Internet Archive from Tue Mar 30 20:14:48 PDT 2004 to Wed Nov 10 07:03:02 PDT 2004
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Thu Jun 19 05:30:16 PDT 2003 to Wed Nov 26 23:13:24 PDT 2003
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Tue Jun 17 21:04:33 PDT 2003 to Sat Nov 29 14:39:46 PDT 2003
Topic: crawldata
Data crawled by Sloan Foundation on behalf of Internet Archive from Mon Jul 09 02:04:40 PDT 2007 to Mon Jul 09 04:14:42 PDT 2007
Topic: crawldata
Live Web Proxy Crawls
web, 1.6M views, 0 favorites, 0 comments
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Wed Mar 30 17:32:13 PDT 2011 to Wed Mar 30 21:33:25 PDT 2011.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Wed May 03 07:47:56 PDT 2006 to Fri Sep 17 05:50:57 PDT 2004
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Mon May 22 18:18:03 PDT 2006 to Sun Feb 13 14:37:12 PDT 2005
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Mon Aug 02 08:33:20 PDT 2004 to Thu Jun 02 08:07:17 PDT 2005
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Fri Apr 11 22:06:37 PDT 2003 to Sat Feb 21 23:23:08 PDT 2004
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Sat Oct 02 20:44:43 PDT 2004 to Sun Mar 06 13:36:26 PDT 2005
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Mon Apr 05 02:11:21 PDT 2004 to Thu Jan 20 07:31:13 PDT 2005
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Wed Sep 01 16:50:05 PDT 2004 to Tue Mar 22 03:58:17 PDT 2005
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Sat Nov 29 18:46:37 PDT 2003 to Fri Sep 26 07:58:46 PDT 2003
Topic: crawldata
Live Web Proxy Crawls
web, 1.4M views, 0 favorites, 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live4.us.archive.org from 2013-08-11T22:09:31 UTC to 2013-08-12T14:50:34 UTC.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Wed Jan 14 19:09:33 PDT 2004 to Wed Sep 01 08:23:46 PDT 2004
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Tue Dec 23 01:33:17 PDT 2003 to Mon Sep 20 03:00:39 PDT 2004
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Tue Sep 14 02:29:34 PDT 2004 to Sat Apr 16 09:35:33 PDT 2005
Topic: crawldata
Live Web Proxy Crawls
web, 1.4M views, 0 favorites, 0 comments
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Thu Mar 31 20:25:07 PDT 2011 to Fri Apr 1 01:20:45 PDT 2011.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Wed Feb 08 00:51:13 PDT 2006 to Fri Aug 13 01:23:55 PDT 2004
Topic: crawldata
Wide Crawl started April 2013
web, 1.4M views, 0 favorites, 0 comments
Internet Archive crawldata from Webwide Crawl, captured by crawl415.us.archive.org:wide from Wed May 15 12:25:51 PDT 2013 to Wed May 15 06:56:55 PDT 2013.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Tue Mar 07 12:09:55 PDT 2006 to Tue Jan 25 01:44:23 PDT 2005
Topic: crawldata
Live Web Proxy Crawls
web, 1.4M views, 0 favorites, 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2013-01-09T20:20:38 UTC to 2013-01-10T08:41:03 UTC.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Thu Feb 24 15:17:36 PDT 2005 to Fri Jul 30 08:44:09 PDT 2004
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Wed Sep 01 16:58:53 PDT 2004 to Tue Mar 08 11:00:13 PDT 2005
Topic: crawldata
Live Web Proxy Crawls
web, 1.3M views, 0 favorites, 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live4.us.archive.org from 2013-07-11T01:38:55 UTC to 2013-07-11T08:44:51 UTC.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Sun Aug 29 05:03:17 PDT 2004 to Sat Mar 05 01:09:59 PDT 2005
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Sat Oct 09 08:01:30 PDT 2004 to Sun Apr 17 12:25:15 PDT 2005
Topic: crawldata
Live Web Proxy Crawls
web, 1.3M views, 0 favorites, 0 comments
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2013-01-30T14:01:07 UTC to 2013-01-30T19:57:29 UTC.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Thu Mar 18 16:15:34 PDT 2004 to Tue Nov 23 08:52:34 PDT 2004
Topic: crawldata
Live Web Proxy Crawls
web, 1.3M views, 0 favorites, 0 comments
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Sat Apr 2 09:19:41 PDT 2011 to Sat Apr 2 16:22:14 PDT 2011.
Topic: crawldata
Survey Crawl April 2013
web, 1.3M views, 0 favorites, 0 comments
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl420.us.archive.org:survey from Mon Dec 23 10:49:15 PST 2013 to Mon Dec 23 05:58:22 PST 2013.
Topic: crawldata